Chapter 17: Tree Basics and Terminology
Understanding Trees Through a Real Problem
Understanding Trees Through a Real Problem
Before diving into abstract definitions, let's encounter a tree structure in its natural habitat. Consider this real-world scenario: you're building a company organization chart system that needs to answer questions like "How many people report to this manager?" or "What's the longest chain of command?"
Here's actual organizational data:
CEO
βββ VP Engineering
β βββ Engineering Manager A
β β βββ Senior Engineer 1
β β βββ Senior Engineer 2
β βββ Engineering Manager B
β βββ Junior Engineer 1
βββ VP Sales
βββ Sales Manager
β βββ Sales Rep 1
β βββ Sales Rep 2
βββ Sales Rep 3
Let's try to represent this with what we know so farβnested dictionaries:
# Attempt 1: Nested dictionaries
org_chart = {
"CEO": {
"VP Engineering": {
"Engineering Manager A": {
"Senior Engineer 1": {},
"Senior Engineer 2": {}
},
"Engineering Manager B": {
"Junior Engineer 1": {}
}
},
"VP Sales": {
"Sales Manager": {
"Sales Rep 1": {},
"Sales Rep 2": {}
},
"Sales Rep 3": {}
}
}
}
# Try to count total employees
def count_employees(org):
"""Count all employees in the organization"""
count = 1 # Count this person
for subordinate, their_reports in org.items():
count += count_employees(their_reports)
return count
print(f"Total employees: {count_employees(org_chart)}")
Python Output:
Total employees: 11
This works! But let's try something more complexβfinding the longest reporting chain:
def max_reporting_depth(org):
"""Find the longest chain of command"""
if not org: # No subordinates
return 0
max_depth = 0
for subordinate, their_reports in org.items():
depth = 1 + max_reporting_depth(their_reports)
max_depth = max(max_depth, depth)
return max_depth
print(f"Longest reporting chain: {max_reporting_depth(org_chart)}")
Python Output:
Longest reporting chain: 4
Now let's try to find a specific person:
def find_person(org, target_name):
"""Find if a person exists in the organization"""
for name, reports in org.items():
if name == target_name:
return True
if find_person(reports, target_name):
return True
return False
print(f"Is 'Senior Engineer 1' in org? {find_person(org_chart, 'Senior Engineer 1')}")
print(f"Is 'CEO' in org? {find_person(org_chart, 'CEO')}")
Python Output:
Is 'Senior Engineer 1' in org? True
Is 'CEO' in org? False
Diagnostic Analysis: Understanding the Failure
Waitβthe CEO is definitely in the organization! Let's trace what happened:
The complete execution trace:
find_person({"CEO": {...}}, "CEO")
β Loop through items: name="CEO", reports={...}
β Is "CEO" == "CEO"? YES!
β But wait... we're checking the KEY, not the root
β The function signature assumes the root is implicit
β We never check if the root itself matches
Root cause identified: Our dictionary representation conflates the container with the content. The CEO is the key of the outer dictionary, but our function only checks keys of subordinates.
Why the current approach is problematic:
1. The root node is treated differently from all other nodes
2. We can't easily store additional data about each person (title, salary, hire date)
3. The structure is awkward: {"name": {"subordinate_name": {...}}}
4. We're mixing the person's identity (name) with their relationships (reports)
What we need: A clear separation between: - Node data (the person's information) - Node relationships (who reports to whom) - Tree structure (how nodes connect)
This is exactly what tree data structures provide. Let's build one properly.
Tree Terminology: The Language of Hierarchies
Tree Terminology: The Language of Hierarchies
Before we fix our organization chart, we need to establish precise vocabulary. Trees have specific terminology that makes discussing them unambiguous.
Core Concepts
A tree is a hierarchical data structure consisting of nodes connected by edges, with these properties: - One root node at the top (no parent) - Every other node has exactly one parent - Nodes can have zero or more children - No cycles (you can't follow edges and return to where you started)
Let's visualize this with our organization:
[CEO] β Root (no parent)
|
ββββββββ΄βββββββ
| |
[VP Eng] [VP Sales] β Children of CEO, Parents of their reports
| |
βββββ΄ββββ βββββ΄ββββ
| | | |
[EM A] [EM B] [SM] [SR3] β Some are parents, some are leaves
| | |
βββ΄ββ [JE1] ββ΄β
| | | |
[SE1][SE2] [SR1][SR2] β Leaves (no children)
Essential Terminology
Node: A single element in the tree containing data - In our example: each person is a node
Edge: A connection between a parent and child - The lines connecting people in the org chart
Root: The topmost node with no parent - The CEO in our example - Every tree has exactly one root
Parent: A node with children below it - VP Engineering is the parent of both Engineering Managers
Child: A node directly connected below another node - Engineering Manager A is a child of VP Engineering
Siblings: Nodes that share the same parent - Engineering Manager A and Engineering Manager B are siblings
Leaf (or Terminal Node): A node with no children - Senior Engineer 1, Senior Engineer 2, Junior Engineer 1, etc. - Also called "external nodes"
Internal Node: A node with at least one child - CEO, VP Engineering, Engineering Manager A, etc. - All non-leaf nodes
Ancestor: Any node on the path from a node up to the root - For Senior Engineer 1: ancestors are Engineering Manager A, VP Engineering, and CEO
Descendant: Any node on the path from a node down to leaves - For VP Engineering: all engineers and managers under them
Subtree: A node and all its descendants - The entire Engineering department is a subtree rooted at VP Engineering
Depth (or Level): Distance from the root - CEO is at depth 0 - VPs are at depth 1 - Managers are at depth 2 - Engineers/Reps are at depth 3
Height of a node: Longest path from that node down to a leaf - CEO has height 3 (CEO β VP β Manager β Engineer) - Engineering Manager B has height 1 (Manager β Engineer) - All leaves have height 0
Height of the tree: Height of the root node - Our org chart has height 3
Let's verify these concepts with code:
# Let's manually trace depth and height for our org chart
# Depth (distance from root):
depths = {
"CEO": 0,
"VP Engineering": 1,
"VP Sales": 1,
"Engineering Manager A": 2,
"Engineering Manager B": 2,
"Sales Manager": 2,
"Sales Rep 3": 2,
"Senior Engineer 1": 3,
"Senior Engineer 2": 3,
"Junior Engineer 1": 3,
"Sales Rep 1": 3,
"Sales Rep 2": 3
}
# Height (longest path to any leaf):
heights = {
"CEO": 3, # CEO β VP β Manager β Engineer
"VP Engineering": 2, # VP β Manager β Engineer
"VP Sales": 2, # VP β Manager β Rep
"Engineering Manager A": 1, # Manager β Engineer
"Engineering Manager B": 1, # Manager β Engineer
"Sales Manager": 1, # Manager β Rep
"Sales Rep 3": 0, # Leaf
"Senior Engineer 1": 0, # Leaf
"Senior Engineer 2": 0, # Leaf
"Junior Engineer 1": 0, # Leaf
"Sales Rep 1": 0, # Leaf
"Sales Rep 2": 0 # Leaf
}
# Verify: depth + height should vary, but max(depth + height) = tree height
for person in ["CEO", "VP Engineering", "Senior Engineer 1"]:
d = depths[person]
h = heights[person]
print(f"{person:25} depth={d}, height={h}, sum={d+h}")
Python Output:
CEO depth=0, height=3, sum=3
VP Engineering depth=1, height=2, sum=3
Senior Engineer 1 depth=3, height=0, sum=3
Notice that depth + height equals the tree height (3) for nodes on the longest path from root to leaf. This is a useful property for validation.
Binary Trees vs General Trees
Binary Trees vs General Trees
Trees come in two fundamental varieties, each suited to different problems.
General Trees (N-ary Trees)
A general tree allows each node to have any number of children (0, 1, 2, 3, ..., n).
Our organization chart is a general tree: - CEO has 2 children (VPs) - VP Engineering has 2 children (Managers) - Engineering Manager A has 2 children (Engineers) - Sales Manager has 2 children (Reps) - VP Sales has 2 children (Manager and Rep)
Use cases for general trees: - File systems (folders can contain any number of files/subfolders) - Organization charts (managers can have any number of reports) - XML/HTML DOM (elements can have any number of children) - Category hierarchies (categories can have any number of subcategories)
Binary Trees
A binary tree restricts each node to at most 2 children, conventionally called left and right.
[5]
/ \
[3] [8]
/ \ / \
[1] [4][7] [9]
Key properties: - Each node has 0, 1, or 2 children - Children are distinguished as "left" and "right" (order matters) - Even if a node has only one child, we specify whether it's left or right
Use cases for binary trees: - Binary search trees (efficient searching/sorting) - Expression trees (mathematical expressions) - Huffman coding trees (data compression) - Decision trees (machine learning)
Structural Differences
Let's see how the same data looks in both structures:
# General tree: representing a simple hierarchy
# Each node can have any number of children
general_tree_example = {
"value": "Root",
"children": [
{
"value": "Child 1",
"children": [
{"value": "Grandchild 1", "children": []},
{"value": "Grandchild 2", "children": []},
{"value": "Grandchild 3", "children": []} # 3 children!
]
},
{
"value": "Child 2",
"children": [
{"value": "Grandchild 4", "children": []}
]
},
{
"value": "Child 3",
"children": []
}
]
}
# Binary tree: same data forced into binary structure
# Each node has at most 2 children (left and right)
binary_tree_example = {
"value": "Root",
"left": {
"value": "Child 1",
"left": {
"value": "Grandchild 1",
"left": None,
"right": None
},
"right": {
"value": "Grandchild 2",
"left": None,
"right": {
"value": "Grandchild 3", # Had to chain this
"left": None,
"right": None
}
}
},
"right": {
"value": "Child 2",
"left": {
"value": "Grandchild 4",
"left": None,
"right": None
},
"right": {
"value": "Child 3",
"left": None,
"right": None
}
}
}
print("General tree allows 3 grandchildren under Child 1")
print("Binary tree must chain or restructure them")
Python Output:
General tree allows 3 grandchildren under Child 1
Binary tree must chain or restructure them
When to Choose Each Type
Choose general trees when: - The number of children varies naturally (file systems, org charts) - You're modeling real-world hierarchies - Flexibility is more important than structure
Choose binary trees when: - You need efficient searching (binary search trees) - You're implementing specific algorithms (heaps, expression evaluation) - The two-child constraint provides algorithmic advantages
For the rest of this chapter, we'll focus on general trees since they're more intuitive for learning recursion. Binary trees get special attention in later chapters due to their algorithmic importance.
Tree Representation in Python: Building a Proper Node Class
Tree Representation in Python: Building a Proper Node Class
Now let's fix our organization chart by building a proper tree structure. We'll create a TreeNode class that cleanly separates data from structure.
Iteration 1: Basic Node Structure
class TreeNode:
"""A node in a general tree (n-ary tree)"""
def __init__(self, value):
self.value = value # The data stored in this node
self.children = [] # List of child nodes
def add_child(self, child_node):
"""Add a child to this node"""
self.children.append(child_node)
def __repr__(self):
"""String representation for debugging"""
return f"TreeNode({self.value})"
# Build our organization chart properly
ceo = TreeNode("CEO")
vp_eng = TreeNode("VP Engineering")
vp_sales = TreeNode("VP Sales")
ceo.add_child(vp_eng)
ceo.add_child(vp_sales)
eng_mgr_a = TreeNode("Engineering Manager A")
eng_mgr_b = TreeNode("Engineering Manager B")
vp_eng.add_child(eng_mgr_a)
vp_eng.add_child(eng_mgr_b)
se1 = TreeNode("Senior Engineer 1")
se2 = TreeNode("Senior Engineer 2")
eng_mgr_a.add_child(se1)
eng_mgr_a.add_child(se2)
je1 = TreeNode("Junior Engineer 1")
eng_mgr_b.add_child(je1)
sales_mgr = TreeNode("Sales Manager")
sr3 = TreeNode("Sales Rep 3")
vp_sales.add_child(sales_mgr)
vp_sales.add_child(sr3)
sr1 = TreeNode("Sales Rep 1")
sr2 = TreeNode("Sales Rep 2")
sales_mgr.add_child(sr1)
sales_mgr.add_child(sr2)
print(f"Root: {ceo}")
print(f"Root's children: {ceo.children}")
print(f"VP Engineering's children: {vp_eng.children}")
Python Output:
Root: TreeNode(CEO)
Root's children: [TreeNode(VP Engineering), TreeNode(VP Sales)]
VP Engineering's children: [TreeNode(Engineering Manager A), TreeNode(Engineering Manager B)]
Now let's rewrite our functions to work with this structure:
def count_employees(node):
"""Count all employees in the tree rooted at node"""
count = 1 # Count this node
for child in node.children:
count += count_employees(child)
return count
def find_person(node, target_name):
"""Find if a person exists in the tree"""
if node.value == target_name: # Check the root!
return True
for child in node.children:
if find_person(child, target_name):
return True
return False
def max_depth(node):
"""Find the height of the tree (longest path to leaf)"""
if not node.children: # Leaf node
return 0
max_child_depth = 0
for child in node.children:
child_depth = max_depth(child)
max_child_depth = max(max_child_depth, child_depth)
return 1 + max_child_depth
print(f"Total employees: {count_employees(ceo)}")
print(f"Is 'CEO' in org? {find_person(ceo, 'CEO')}")
print(f"Is 'Senior Engineer 1' in org? {find_person(ceo, 'Senior Engineer 1')}")
print(f"Tree height: {max_depth(ceo)}")
Python Output:
Total employees: 11
Is 'CEO' in org? True
Is 'Senior Engineer 1' in org? True
Tree height: 3
Improvement: Now all our functions work correctly! The CEO is found, and the structure is much clearer.
What We Gained
Compare the old dictionary approach to the new TreeNode approach:
Old (Dictionary):
org_chart = {
"CEO": {
"VP Engineering": {...}
}
}
- Root is implicit (the outer dictionary key)
- Can't easily add metadata (hire date, salary, etc.)
- Awkward to traverse
- Mixing data with structure
New (TreeNode):
ceo = TreeNode("CEO")
vp_eng = TreeNode("VP Engineering")
ceo.add_child(vp_eng)
- Root is explicit (the
ceovariable) - Easy to add metadata (just add attributes to
TreeNode) - Natural to traverse (follow
.children) - Clean separation of data and structure
Iteration 2: Adding Metadata
Let's enhance our TreeNode to store more information:
class TreeNode:
"""Enhanced node with metadata"""
def __init__(self, name, title=None, hire_date=None):
self.name = name
self.title = title
self.hire_date = hire_date
self.children = []
def add_child(self, child_node):
self.children.append(child_node)
def __repr__(self):
return f"TreeNode({self.name}, {self.title})"
# Rebuild with metadata
ceo = TreeNode("Alice Smith", "Chief Executive Officer", "2015-01-01")
vp_eng = TreeNode("Bob Jones", "VP of Engineering", "2016-03-15")
vp_sales = TreeNode("Carol White", "VP of Sales", "2016-06-01")
ceo.add_child(vp_eng)
ceo.add_child(vp_sales)
eng_mgr_a = TreeNode("Dave Brown", "Engineering Manager", "2017-02-01")
eng_mgr_b = TreeNode("Eve Davis", "Engineering Manager", "2018-01-15")
vp_eng.add_child(eng_mgr_a)
vp_eng.add_child(eng_mgr_b)
# Now we can write richer queries
def find_by_title(node, target_title):
"""Find all employees with a given title"""
results = []
if node.title == target_title:
results.append(node.name)
for child in node.children:
results.extend(find_by_title(child, target_title))
return results
def print_org_chart(node, indent=0):
"""Pretty-print the organization chart"""
print(" " * indent + f"ββ {node.name} ({node.title})")
for child in node.children:
print_org_chart(child, indent + 1)
print("Engineering Managers:")
print(find_by_title(ceo, "Engineering Manager"))
print("\nOrganization Chart:")
print_org_chart(ceo)
Python Output:
Engineering Managers:
['Dave Brown', 'Eve Davis']
Organization Chart:
ββ Alice Smith (Chief Executive Officer)
ββ Bob Jones (VP of Engineering)
ββ Dave Brown (Engineering Manager)
ββ Eve Davis (Engineering Manager)
ββ Carol White (VP of Sales)
Iteration 3: Adding Parent References
Sometimes we need to traverse upward in the tree (from child to parent). Let's add that capability:
class TreeNode:
"""Node with bidirectional links"""
def __init__(self, name, title=None):
self.name = name
self.title = title
self.parent = None # Reference to parent node
self.children = []
def add_child(self, child_node):
self.children.append(child_node)
child_node.parent = self # Set the parent reference
def get_ancestors(self):
"""Get all ancestors from this node to root"""
ancestors = []
current = self.parent
while current is not None:
ancestors.append(current.name)
current = current.parent
return ancestors
def get_depth(self):
"""Calculate depth (distance from root)"""
depth = 0
current = self.parent
while current is not None:
depth += 1
current = current.parent
return depth
def __repr__(self):
return f"TreeNode({self.name})"
# Rebuild with parent references
ceo = TreeNode("Alice Smith", "CEO")
vp_eng = TreeNode("Bob Jones", "VP Engineering")
vp_sales = TreeNode("Carol White", "VP Sales")
ceo.add_child(vp_eng)
ceo.add_child(vp_sales)
eng_mgr_a = TreeNode("Dave Brown", "Engineering Manager")
vp_eng.add_child(eng_mgr_a)
se1 = TreeNode("Frank Miller", "Senior Engineer")
eng_mgr_a.add_child(se1)
# Now we can traverse upward
print(f"{se1.name}'s reporting chain: {se1.get_ancestors()}")
print(f"{se1.name}'s depth: {se1.get_depth()}")
print(f"{vp_eng.name}'s depth: {vp_eng.get_depth()}")
print(f"{ceo.name}'s depth: {ceo.get_depth()}")
Python Output:
Frank Miller's reporting chain: ['Dave Brown', 'Bob Jones', 'Alice Smith']
Frank Miller's depth: 3
Bob Jones's depth: 1
Alice Smith's depth: 0
When to use parent references: - When you need to traverse upward frequently - When calculating paths between nodes - When implementing certain tree algorithms (like lowest common ancestor)
Trade-off: Parent references add memory overhead and complexity (must maintain consistency when modifying the tree).
Binary Tree Representation
Binary Tree Representation
Binary trees deserve their own class since they have a different structure (left/right instead of a list of children):
class BinaryTreeNode:
"""A node in a binary tree"""
def __init__(self, value):
self.value = value
self.left = None # Left child
self.right = None # Right child
def __repr__(self):
return f"BinaryTreeNode({self.value})"
# Build a simple binary tree
# 5
# / \
# 3 8
# / \
# 1 4
root = BinaryTreeNode(5)
root.left = BinaryTreeNode(3)
root.right = BinaryTreeNode(8)
root.left.left = BinaryTreeNode(1)
root.left.right = BinaryTreeNode(4)
def count_nodes(node):
"""Count all nodes in a binary tree"""
if node is None:
return 0
return 1 + count_nodes(node.left) + count_nodes(node.right)
def find_value(node, target):
"""Find if a value exists in the binary tree"""
if node is None:
return False
if node.value == target:
return True
return find_value(node.left, target) or find_value(node.right, target)
def tree_height(node):
"""Calculate height of binary tree"""
if node is None:
return -1 # Height of empty tree is -1 by convention
left_height = tree_height(node.left)
right_height = tree_height(node.right)
return 1 + max(left_height, right_height)
print(f"Total nodes: {count_nodes(root)}")
print(f"Contains 4? {find_value(root, 4)}")
print(f"Contains 7? {find_value(root, 7)}")
print(f"Tree height: {tree_height(root)}")
Python Output:
Total nodes: 5
Contains 4? True
Contains 7? False
Tree height: 2
Binary Tree vs General Tree: Code Comparison
Notice the pattern differences:
General Tree (n-ary):
def process_tree(node):
# Process this node
result = process(node.value)
# Recurse on ALL children
for child in node.children:
result = combine(result, process_tree(child))
return result
Binary Tree:
def process_tree(node):
if node is None:
return base_case
# Process this node
result = process(node.value)
# Recurse on EXACTLY TWO children
left_result = process_tree(node.left)
right_result = process_tree(node.right)
return combine(result, left_result, right_result)
The binary tree version is more explicit about the two recursive calls, while the general tree version uses a loop to handle variable numbers of children.
Visualizing Tree Execution
Visualizing Tree Execution
Let's trace how recursion actually executes on our tree structures. Understanding the call stack for tree recursion is crucial.
Manual Trace: Counting Nodes
Let's trace count_nodes() on this small tree:
A
/ \
B C
/
D
# Build the tree
a = BinaryTreeNode("A")
a.left = BinaryTreeNode("B")
a.right = BinaryTreeNode("C")
a.left.left = BinaryTreeNode("D")
def count_nodes_verbose(node, indent=0):
"""Count nodes with execution trace"""
prefix = " " * indent
if node is None:
print(f"{prefix}count_nodes(None) β 0")
return 0
print(f"{prefix}count_nodes({node.value})")
print(f"{prefix} β Counting left subtree...")
left_count = count_nodes_verbose(node.left, indent + 2)
print(f"{prefix} β Counting right subtree...")
right_count = count_nodes_verbose(node.right, indent + 2)
total = 1 + left_count + right_count
print(f"{prefix} β Total for {node.value}: 1 + {left_count} + {right_count} = {total}")
return total
result = count_nodes_verbose(a)
print(f"\nFinal result: {result}")
Python Output:
count_nodes(A)
β Counting left subtree...
count_nodes(B)
β Counting left subtree...
count_nodes(D)
β Counting left subtree...
count_nodes(None) β 0
β Counting right subtree...
count_nodes(None) β 0
β Total for D: 1 + 0 + 0 = 1
β Counting right subtree...
count_nodes(None) β 0
β Total for B: 1 + 1 + 0 = 2
β Counting right subtree...
count_nodes(C)
β Counting left subtree...
count_nodes(None) β 0
β Counting right subtree...
count_nodes(None) β 0
β Total for C: 1 + 0 + 0 = 1
β Total for A: 1 + 2 + 1 = 4
Final result: 4
Execution Trace Analysis
Let's parse this execution section by section:
- The recursion pattern:
- Process current node
- Recurse left (goes deep before returning)
- Recurse right (only after left completes)
-
Combine results
-
The call stack depth:
- Maximum depth: 3 (A β B β D β None)
- This equals the tree height + 1
-
Each level of indentation represents one stack frame
-
The order of execution:
- Depth-first: we go all the way down the left side before touching the right
-
Post-order: we process children before parents (counts bubble up)
-
The base case:
Nonenodes return 0 immediately- These are the leaves of the recursion tree
- Without this base case, we'd get
AttributeError: 'NoneType' object has no attribute 'left'
Recursion Tree Visualization
The recursion tree (not the data tree!) shows all function calls:
count_nodes(A)
/ \
count_nodes(B) count_nodes(C)
/ \ / \
count_nodes(D) count_nodes(None) count_nodes(None) count_nodes(None)
/ \
count_nodes(None) count_nodes(None)
Each node in this recursion tree represents one function call. The leaves are all base cases (None nodes).
Key insight: For a binary tree with n nodes, we make exactly 2n+1 function calls (n for actual nodes, n+1 for None children).
Common Tree Operations: The Essential Toolkit
Common Tree Operations: The Essential Toolkit
Now that we have proper tree structures, let's implement the fundamental operations you'll use repeatedly. These form the foundation for all tree algorithms.
Operation 1: Tree Size (Node Count)
We've seen this, but let's formalize it:
def size(node):
"""
Count total nodes in tree.
Base case: Empty tree has size 0
Recursive case: 1 (this node) + size of all subtrees
"""
if node is None:
return 0
# For binary tree:
return 1 + size(node.left) + size(node.right)
def size_general(node):
"""Size for general (n-ary) tree"""
if node is None:
return 0
count = 1 # This node
for child in node.children:
count += size_general(child)
return count
Complexity Analysis: - Time: O(n) - visit each node exactly once - Space: O(h) - call stack depth equals tree height h - Recurrence: T(n) = T(left) + T(right) + O(1)
Operation 2: Tree Height (Maximum Depth)
def height(node):
"""
Calculate height of tree (longest path to leaf).
Base case: Empty tree has height -1 (by convention)
Leaf node has height 0
Recursive case: 1 + max height of subtrees
"""
if node is None:
return -1
left_height = height(node.left)
right_height = height(node.right)
return 1 + max(left_height, right_height)
def height_general(node):
"""Height for general tree"""
if node is None:
return -1
if not node.children: # Leaf node
return 0
max_child_height = -1
for child in node.children:
child_height = height_general(child)
max_child_height = max(max_child_height, child_height)
return 1 + max_child_height
# Test with our tree
# A
# / \
# B C
# /
# D
a = BinaryTreeNode("A")
a.left = BinaryTreeNode("B")
a.right = BinaryTreeNode("C")
a.left.left = BinaryTreeNode("D")
print(f"Tree height: {height(a)}") # Should be 2
print(f"Height of B subtree: {height(a.left)}") # Should be 1
print(f"Height of C subtree: {height(a.right)}") # Should be 0
Python Output:
Tree height: 2
Height of B subtree: 1
Height of C subtree: 0
Complexity Analysis: - Time: O(n) - must visit all nodes to find longest path - Space: O(h) - call stack depth - Recurrence: T(n) = T(left) + T(right) + O(1)
Operation 3: Search (Find Value)
def contains(node, target):
"""
Check if value exists in tree.
Base case: Empty tree doesn't contain anything
Recursive case: Check this node, then check all subtrees
"""
if node is None:
return False
if node.value == target:
return True
# For binary tree: check both subtrees
return contains(node.left, target) or contains(node.right, target)
def contains_general(node, target):
"""Search in general tree"""
if node is None:
return False
if node.value == target:
return True
# Check all children
for child in node.children:
if contains_general(child, target):
return True
return False
# Test
print(f"Contains 'B'? {contains(a, 'B')}") # True
print(f"Contains 'D'? {contains(a, 'D')}") # True
print(f"Contains 'Z'? {contains(a, 'Z')}") # False
Python Output:
Contains 'B'? True
Contains 'D'? True
Contains 'Z'? False
Complexity Analysis: - Time: O(n) worst case - might need to check all nodes - Space: O(h) - call stack depth - Best case: O(1) if target is at root - Average case: O(n) for unordered tree
Operation 4: Leaf Count
def count_leaves(node):
"""
Count leaf nodes (nodes with no children).
Base case: Empty tree has 0 leaves
Leaf case: Node with no children is 1 leaf
Recursive case: Sum leaves in all subtrees
"""
if node is None:
return 0
# Check if this is a leaf
if node.left is None and node.right is None:
return 1
# Not a leaf, count leaves in subtrees
return count_leaves(node.left) + count_leaves(node.right)
def count_leaves_general(node):
"""Count leaves in general tree"""
if node is None:
return 0
if not node.children: # Leaf node
return 1
leaf_count = 0
for child in node.children:
leaf_count += count_leaves_general(child)
return leaf_count
# Test
print(f"Number of leaves: {count_leaves(a)}") # C and D are leaves = 2
Python Output:
Number of leaves: 2
Operation 5: Print All Values (Tree Traversal Preview)
def print_all_values(node, indent=0):
"""
Print all values with indentation showing structure.
This is a preorder traversal (node before children).
"""
if node is None:
return
print(" " * indent + str(node.value))
print_all_values(node.left, indent + 1)
print_all_values(node.right, indent + 1)
def print_all_values_general(node, indent=0):
"""Print all values in general tree"""
if node is None:
return
print(" " * indent + str(node.value))
for child in node.children:
print_all_values_general(child, indent + 1)
print("Tree structure:")
print_all_values(a)
Python Output:
Tree structure:
A
B
D
C
This gives us a visual representation of the tree structure through indentation.
Lab: Build a Simple Tree Class
Lab: Build a Simple Tree Class
Now it's your turn to synthesize everything we've learned. You'll build a complete, production-ready tree class with all the operations we've discussed.
Lab Objectives
- Create a
TreeNodeclass for general (n-ary) trees - Implement all fundamental operations as methods
- Add useful utility methods
- Test thoroughly with real data
Starter Code
Here's the skeleton to complete:
class TreeNode:
"""
A node in a general (n-ary) tree.
Each node stores a value and maintains a list of children.
Optionally maintains a parent reference for upward traversal.
"""
def __init__(self, value, maintain_parent_refs=False):
"""
Initialize a tree node.
Args:
value: The data to store in this node
maintain_parent_refs: If True, maintain parent pointers
"""
self.value = value
self.children = []
self.parent = None if maintain_parent_refs else None
self._maintain_parents = maintain_parent_refs
def add_child(self, child_node):
"""
Add a child to this node.
Args:
child_node: TreeNode to add as child
"""
self.children.append(child_node)
if self._maintain_parents:
child_node.parent = self
def remove_child(self, child_node):
"""
Remove a child from this node.
Args:
child_node: TreeNode to remove
Returns:
True if child was found and removed, False otherwise
"""
try:
self.children.remove(child_node)
if self._maintain_parents:
child_node.parent = None
return True
except ValueError:
return False
def size(self):
"""
Count total nodes in subtree rooted at this node.
Returns:
Integer count of nodes
"""
count = 1 # Count this node
for child in self.children:
count += child.size()
return count
def height(self):
"""
Calculate height of subtree rooted at this node.
Height is the longest path from this node to any leaf.
Returns:
Integer height (0 for leaf nodes)
"""
if not self.children: # Leaf node
return 0
max_child_height = 0
for child in self.children:
child_height = child.height()
max_child_height = max(max_child_height, child_height)
return 1 + max_child_height
def depth(self):
"""
Calculate depth of this node (distance from root).
Only works if parent references are maintained.
Returns:
Integer depth (0 for root)
Raises:
RuntimeError if parent references not maintained
"""
if not self._maintain_parents:
raise RuntimeError("depth() requires maintain_parent_refs=True")
depth = 0
current = self.parent
while current is not None:
depth += 1
current = current.parent
return depth
def is_leaf(self):
"""Check if this node is a leaf (has no children)"""
return len(self.children) == 0
def is_root(self):
"""
Check if this node is the root (has no parent).
Only works if parent references are maintained.
"""
if not self._maintain_parents:
raise RuntimeError("is_root() requires maintain_parent_refs=True")
return self.parent is None
def count_leaves(self):
"""Count leaf nodes in subtree rooted at this node"""
if self.is_leaf():
return 1
leaf_count = 0
for child in self.children:
leaf_count += child.count_leaves()
return leaf_count
def contains(self, target_value):
"""
Check if target value exists in subtree.
Args:
target_value: Value to search for
Returns:
True if found, False otherwise
"""
if self.value == target_value:
return True
for child in self.children:
if child.contains(target_value):
return True
return False
def find(self, target_value):
"""
Find and return the node with target value.
Args:
target_value: Value to search for
Returns:
TreeNode if found, None otherwise
"""
if self.value == target_value:
return self
for child in self.children:
result = child.find(target_value)
if result is not None:
return result
return None
def get_ancestors(self):
"""
Get list of ancestor values from parent to root.
Only works if parent references are maintained.
Returns:
List of values from parent to root
"""
if not self._maintain_parents:
raise RuntimeError("get_ancestors() requires maintain_parent_refs=True")
ancestors = []
current = self.parent
while current is not None:
ancestors.append(current.value)
current = current.parent
return ancestors
def print_tree(self, indent=0):
"""
Pretty-print the tree structure.
Args:
indent: Current indentation level (used internally)
"""
print(" " * indent + f"ββ {self.value}")
for child in self.children:
child.print_tree(indent + 1)
def __repr__(self):
"""String representation for debugging"""
return f"TreeNode({self.value})"
def __str__(self):
"""Human-readable string representation"""
return str(self.value)
Lab Exercise 1: Build an Organization Chart
Use the TreeNode class to build a complete organization chart:
# Build the organization
ceo = TreeNode("Alice (CEO)", maintain_parent_refs=True)
vp_eng = TreeNode("Bob (VP Engineering)")
vp_sales = TreeNode("Carol (VP Sales)")
vp_ops = TreeNode("David (VP Operations)")
ceo.add_child(vp_eng)
ceo.add_child(vp_sales)
ceo.add_child(vp_ops)
# Engineering department
eng_mgr_a = TreeNode("Eve (Eng Manager)")
eng_mgr_b = TreeNode("Frank (Eng Manager)")
vp_eng.add_child(eng_mgr_a)
vp_eng.add_child(eng_mgr_b)
se1 = TreeNode("Grace (Senior Engineer)")
se2 = TreeNode("Henry (Senior Engineer)")
je1 = TreeNode("Iris (Junior Engineer)")
eng_mgr_a.add_child(se1)
eng_mgr_a.add_child(se2)
eng_mgr_b.add_child(je1)
# Sales department
sales_mgr = TreeNode("Jack (Sales Manager)")
vp_sales.add_child(sales_mgr)
sr1 = TreeNode("Kelly (Sales Rep)")
sr2 = TreeNode("Leo (Sales Rep)")
sales_mgr.add_child(sr1)
sales_mgr.add_child(sr2)
# Operations department
ops_mgr = TreeNode("Mia (Ops Manager)")
vp_ops.add_child(ops_mgr)
# Test all operations
print("=== Organization Chart ===")
ceo.print_tree()
print(f"\n=== Statistics ===")
print(f"Total employees: {ceo.size()}")
print(f"Organization depth: {ceo.height()}")
print(f"Number of individual contributors: {ceo.count_leaves()}")
print(f"\n=== Search Operations ===")
print(f"Is Grace in the org? {ceo.contains('Grace (Senior Engineer)')}")
print(f"Is Zeus in the org? {ceo.contains('Zeus')}")
grace_node = ceo.find("Grace (Senior Engineer)")
if grace_node:
print(f"\nGrace's reporting chain: {grace_node.get_ancestors()}")
print(f"Grace's depth in org: {grace_node.depth()}")
print(f"\n=== Department Sizes ===")
print(f"Engineering department size: {vp_eng.size()}")
print(f"Sales department size: {vp_sales.size()}")
print(f"Operations department size: {vp_ops.size()}")
Python Output:
=== Organization Chart ===
ββ Alice (CEO)
ββ Bob (VP Engineering)
ββ Eve (Eng Manager)
ββ Grace (Senior Engineer)
ββ Henry (Senior Engineer)
ββ Frank (Eng Manager)
ββ Iris (Junior Engineer)
ββ Carol (VP Sales)
ββ Jack (Sales Manager)
ββ Kelly (Sales Rep)
ββ Leo (Sales Rep)
ββ David (VP Operations)
ββ Mia (Ops Manager)
=== Statistics ===
Total employees: 13
Organization depth: 3
Number of individual contributors: 6
=== Search Operations ===
Is Grace in the org? True
Is Zeus in the org? False
Grace's reporting chain: ['Eve (Eng Manager)', 'Bob (VP Engineering)', 'Alice (CEO)']
Grace's depth in org: 3
=== Department Sizes ===
Engineering department size: 6
Sales department size: 4
Operations department size: 2
Lab Exercise 2: File System Tree
Build a file system representation:
# Build a file system tree
root = TreeNode("/")
home = TreeNode("home")
root.add_child(home)
user = TreeNode("user")
home.add_child(user)
documents = TreeNode("documents")
downloads = TreeNode("downloads")
user.add_child(documents)
user.add_child(downloads)
doc1 = TreeNode("report.pdf")
doc2 = TreeNode("notes.txt")
documents.add_child(doc1)
documents.add_child(doc2)
file1 = TreeNode("image.jpg")
file2 = TreeNode("video.mp4")
downloads.add_child(file1)
downloads.add_child(file2)
etc = TreeNode("etc")
root.add_child(etc)
config = TreeNode("config.ini")
etc.add_child(config)
print("=== File System Structure ===")
root.print_tree()
print(f"\n=== File System Statistics ===")
print(f"Total items: {root.size()}")
print(f"Directory depth: {root.height()}")
print(f"Number of files (leaves): {root.count_leaves()}")
print(f"\n=== Search Operations ===")
print(f"Contains 'report.pdf'? {root.contains('report.pdf')}")
print(f"Contains 'missing.txt'? {root.contains('missing.txt')}")
print(f"\n=== Directory Sizes ===")
print(f"Items in /home/user: {user.size()}")
print(f"Items in /home/user/documents: {documents.size()}")
Python Output:
=== File System Structure ===
ββ /
ββ home
ββ user
ββ documents
ββ report.pdf
ββ notes.txt
ββ downloads
ββ image.jpg
ββ video.mp4
ββ etc
ββ config.ini
=== File System Statistics ===
Total items: 11
Directory depth: 4
Number of files (leaves): 5
=== Search Operations ===
Contains 'report.pdf'? True
Contains 'missing.txt'? False
=== Directory Sizes ===
Items in /home/user: 7
Items in /home/user/documents: 3
Lab Exercise 3: Challenge Problems
Now try these on your own:
-
Add a
get_path()method that returns the full path from root to a node (e.g., "/home/user/documents/report.pdf") -
Add a
find_all()method that returns a list of ALL nodes matching a value (not just the first one) -
Add a
level_order_values()method that returns values level by level (breadth-first) -
Add a
max_value()method that finds the maximum value in the tree (assuming numeric values) -
Add a
prune()method that removes all leaf nodes with a specific value
Here's a template for the challenge methods:
class TreeNode:
# ... (previous methods) ...
def get_path(self):
"""
Get the full path from root to this node.
Requires parent references.
Returns:
String path like "/home/user/documents"
"""
if not self._maintain_parents:
raise RuntimeError("get_path() requires maintain_parent_refs=True")
# YOUR CODE HERE
# Hint: Build path by traversing up to root, then reverse
pass
def find_all(self, target_value):
"""
Find all nodes with target value.
Args:
target_value: Value to search for
Returns:
List of TreeNode objects
"""
# YOUR CODE HERE
# Hint: Accumulate results from this node and all children
pass
def level_order_values(self):
"""
Get values level by level (breadth-first).
Returns:
List of lists, where each inner list is one level
"""
# YOUR CODE HERE
# Hint: Use a queue to track nodes at each level
pass
def max_value(self):
"""
Find maximum value in tree (assumes numeric values).
Returns:
Maximum value found
"""
# YOUR CODE HERE
# Hint: Compare this node's value with max of all children
pass
def prune(self, target_value):
"""
Remove all leaf nodes with target value.
Args:
target_value: Value of leaves to remove
Returns:
Number of nodes removed
"""
# YOUR CODE HERE
# Hint: Recursively prune children first, then check if this became a leaf
pass
# Test your implementations here
Solutions to Challenge Problems
Here are reference implementations:
class TreeNode:
# ... (previous methods) ...
def get_path(self):
"""Get full path from root to this node"""
if not self._maintain_parents:
raise RuntimeError("get_path() requires maintain_parent_refs=True")
path_parts = [str(self.value)]
current = self.parent
while current is not None:
path_parts.append(str(current.value))
current = current.parent
path_parts.reverse()
return "/" + "/".join(path_parts)
def find_all(self, target_value):
"""Find all nodes with target value"""
results = []
if self.value == target_value:
results.append(self)
for child in self.children:
results.extend(child.find_all(target_value))
return results
def level_order_values(self):
"""Get values level by level"""
if not self:
return []
result = []
current_level = [self]
while current_level:
level_values = [node.value for node in current_level]
result.append(level_values)
next_level = []
for node in current_level:
next_level.extend(node.children)
current_level = next_level
return result
def max_value(self):
"""Find maximum value (assumes numeric values)"""
max_val = self.value
for child in self.children:
child_max = child.max_value()
max_val = max(max_val, child_max)
return max_val
def prune(self, target_value):
"""Remove all leaf nodes with target value"""
removed_count = 0
# First, recursively prune all children
for child in self.children[:]: # Copy list to avoid modification during iteration
removed_count += child.prune(target_value)
# Then remove children that became leaves with target value
for child in self.children[:]:
if child.is_leaf() and child.value == target_value:
self.remove_child(child)
removed_count += 1
return removed_count
# Test the solutions
test_tree = TreeNode(10)
test_tree.add_child(TreeNode(5))
test_tree.add_child(TreeNode(15))
test_tree.children[0].add_child(TreeNode(3))
test_tree.children[0].add_child(TreeNode(7))
test_tree.children[1].add_child(TreeNode(12))
test_tree.children[1].add_child(TreeNode(20))
print("Level order:", test_tree.level_order_values())
print("Max value:", test_tree.max_value())
# Test find_all with duplicate values
dup_tree = TreeNode(5)
dup_tree.add_child(TreeNode(3))
dup_tree.add_child(TreeNode(5))
dup_tree.children[0].add_child(TreeNode(5))
print(f"Found {len(dup_tree.find_all(5))} nodes with value 5")
Python Output:
Level order: [[10], [5, 15], [3, 7, 12, 20]]
Max value: 20
Found 3 nodes with value 5
The Journey: From Problem to Solution
The Journey: From Problem to Solution
Let's review how we progressed from a problematic dictionary representation to a robust tree class:
| Iteration | Problem | Solution Applied | Result |
|---|---|---|---|
| 0 | Nested dicts, root not found | None | Awkward, buggy |
| 1 | Need proper structure | TreeNode class with children | Clean, works correctly |
| 2 | Need metadata storage | Add attributes to TreeNode | Flexible data storage |
| 3 | Need upward traversal | Add parent references | Bidirectional navigation |
| 4 | Need complete operations toolkit | Implement all standard methods | Production-ready class |
Final Implementation Summary
Our complete TreeNode class provides:
Core Structure: - Value storage - Children list (for n-ary trees) - Optional parent references
Fundamental Operations (O(n) time, O(h) space):
- size() - count all nodes
- height() - longest path to leaf
- depth() - distance from root
- contains() - search for value
- find() - return node with value
- count_leaves() - count leaf nodes
Utility Methods:
- is_leaf() - check if node is leaf
- is_root() - check if node is root
- get_ancestors() - path to root
- print_tree() - visual representation
Advanced Operations (challenge problems):
- get_path() - full path string
- find_all() - find all matching nodes
- level_order_values() - breadth-first traversal
- max_value() - find maximum
- prune() - remove matching leaves
Complexity Analysis
Time Complexity: - Most operations: O(n) where n = number of nodes - Must visit each node once in worst case - No operation is faster than O(h) where h = height
Space Complexity: - Call stack depth: O(h) for recursive operations - For balanced trees: h = O(log n) - For degenerate trees (linear): h = O(n)
Recurrence Relations: - Size: T(n) = T(cβ) + T(cβ) + ... + T(cβ) + O(1) - Height: T(n) = max(T(cβ), T(cβ), ..., T(cβ)) + O(1) - Search: T(n) = T(cβ) + T(cβ) + ... + T(cβ) + O(1)
Where cβ, cβ, ..., cβ are the children of the current node.
When to Use This Tree Structure
Choose general (n-ary) trees when: - Modeling real-world hierarchies (org charts, file systems) - Number of children varies naturally - Flexibility is more important than algorithmic constraints
Choose binary trees when: - Implementing search trees (BST, AVL, Red-Black) - Building heaps for priority queues - Representing mathematical expressions - The two-child constraint provides algorithmic advantages
Key Lessons Learned
-
Separate data from structure: Don't conflate the node's value with its relationships
-
Make the root explicit: The root should be a first-class object, not implicit in the container
-
Choose your invariants: Decide whether to maintain parent references based on your use case
-
Recursion is natural for trees: Every tree operation follows the pattern "process this node, recurse on children"
-
Base cases are crucial: Always handle empty trees (None) and leaf nodes explicitly
-
Visualization helps debugging: The
print_tree()method is invaluable for understanding structure -
Test with real data: Organization charts and file systems are excellent test cases because they expose edge cases naturally
Looking Ahead: Tree Traversals
Looking Ahead: Tree Traversals
We've built the foundationβa solid tree structure with basic operations. But we've only scratched the surface of what we can do with trees.
In the next chapter, we'll explore tree traversals: systematic ways to visit every node in a tree. You've already seen hints of this:
print_tree()visits nodes in a specific order (preorder)level_order_values()visits level by level (breadth-first)
But there are many more traversal patterns, each suited to different problems:
Depth-First Traversals: - Preorder: Process node before children (what we've been doing) - Postorder: Process children before node (useful for deletion, evaluation) - Inorder: Only for binary trees (used in BSTs for sorted output)
Breadth-First Traversal: - Level-order: Process all nodes at depth d before depth d+1
Each traversal pattern reveals different properties of the tree and enables different algorithms. Understanding when to use each traversal is key to mastering tree algorithms.
Preview: Why Traversals Matter
Consider these problems:
- Deleting a tree: Must delete children before parent (postorder)
- Evaluating an expression tree: Must evaluate operands before operator (postorder)
- Printing a sorted BST: Must visit left subtree, then node, then right subtree (inorder)
- Finding shortest path: Must explore level by level (breadth-first)
The traversal pattern you choose determines whether your algorithm is elegant or awkward, efficient or wasteful.
What You've Mastered
Before moving on, make sure you can:
β Explain the difference between general and binary trees
β Define: node, edge, root, parent, child, leaf, ancestor, descendant, depth, height
β Implement a TreeNode class with all fundamental operations
β Trace recursive execution on tree structures
β Calculate time and space complexity for tree operations
β Choose appropriate tree structures for different problems
If you can do all of these, you're ready for tree traversals. If not, review the sections above and work through the lab exercises again.
The journey continues in Chapter 18: Tree Traversals.